Introduction

Shiny scExplorer is a tool for analyzing single-cell RNA-seq data collected from acute myeloid leukemia patients. The app can be used to visualize gene expression data and surface protein data to determine how patterns in expression vary between patients, between cell clusters, between response categories (sensitive vs. resistant patients), and between other metadata categories. The app also shows genes differentially expressed between metadata categories as an interactive table that can be grouped by different categories. Plots and tables in the app are created using the Seurat R package.

Creating Plots from Data

To create plots from the data, navigate to the plots tab. Plots are displayed in the main window according to the options specified in the bar on the left. At the top of the bar, you will see a checkbox for each type of plot; check the box next to a plot type to view it in the main window. When a plot type is added, a tab with options specific to that plot will appear in the options bar. The tab may be opened or closed by clicking its header.

Feature Selection

If feature, violin, or dot plots are selected, a textbox will appear asking for features to include on the plots. Both genes and surface protein markers can be added here.

To add a gene or a surface protein marker, enter its name in the box. As you type, a menu of matching genes, surface markers, or both will appear. Select the desired feature name from the list to view it on the plots.

Multiple features can be added, and selected features will appear on all plots (except for the UMAP plot, which does not support display by feature). To remove a feature, click the “x” icon in its tag; alternatively, press backspace when the text cursor is in front of the feature tag.

Feature names not on the list do not exist in the data being analyzed, though they may be present under an alternate name. The list may not immediately appear while the plots are updating; if this happens, please wait a few seconds for the updates to finish and the list to load. Feel free to contact us if the list does not load at all or takes too long to display options.

Separate Features for Dot Plots

Dot plots are well-suited for visualizing larger numbers of features. To add additional features to the dot plot, check the “use separate features for dot plot” checkbox. A separate feature entry text box will appear, and features added in this box will only appear on the dot plot.

Grouping Plots by a Variable

UMAP, violin, and dot plots can be grouped by a metadata variable to help identify trends associated with that variable. For UMAP plots, cells are colored according to their group; for violin plots, a separate distribution is displayed for each group; and for dot plots, one dot is displayed per group for each feature. Feature plots do not support grouping since they are already colored according to the expression values of a specified feature. By default, all plots are grouped by clusters; this can be changed by opening the options tab for the desired plot type and selecting a different option from the “metadata to group by” dropdown menu. Please see below for a detailed explanation of the metadata variables available.

Metadata Choices

Clusters: Cell clusters determined by Seurat and ClustifyR. Clusters generally group cells by type (primitive populations, monocytic populations, etc.), though the accuracy of clustering is not perfect and some cells may be incorrectly assigned.

Response: Description of the patient’s response to treatment (S = Sensitive, R = Resistant).

Treatment: The treatment assigned.

Patient ID: The unique identifier associated with each patient included in the dataset. Grouping by this variable will display data for each unique patient.

Splitting Plots by a Variable

Metadata may also be used to “split” plots: setting a split by variable divides the data into multiple plots based on the chosen metadata variable. For instance, splitting the data by response will create two plots, one showing the cells from resistant patients and the other showing the cells from sensitive patients. Splitting can be performed on UMAP, feature, and violin plots.

For UMAP and feature plots, selecting a split by variable creates a separate plot for each possible value of the variable. If there are two response categories, choosing “response” creates two plots; if there are 15 clusters, choosing “clusters” creates 15 plots, which is not recommended for UMAP and feature plots. For violin plots, selecting a split by variable instead divides the groups on the plot into subgroups based on the values of the split by variable.

To split a plot by a variable, open the tab of options specific to the desired plot type and select a variable from the “metadata to split by” dropdown menu. In the example below, selecting “clusters” as the split by variable, “response” as the group by variable, and “BCL2” as the selected feature segments the response groups into clusters, showing expression values for BCL2 in each cluster of cells from sensitive and resistant patients.
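The way a group by variable and a split by variable combine on a violin plot can be sketched in a few lines of code. This is only an illustration written in Python, not the app's own R/Seurat code: the cell records, the cluster names, and the subgroup_means helper are all hypothetical, and each subgroup is summarized by its mean, whereas the violin plot shows the full distribution.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-cell records: (response, cluster, BCL2 expression).
# These values are made up for illustration and do not come from the dataset.
cells = [
    ("S", "Primitive", 1.2),
    ("S", "Primitive", 0.8),
    ("S", "Monocytic", 0.1),
    ("R", "Primitive", 2.0),
    ("R", "Monocytic", 0.4),
    ("R", "Monocytic", 0.6),
]


def subgroup_means(records):
    """Return the mean expression for each (group, split) combination."""
    values = defaultdict(list)
    for group, split, expression in records:
        values[(group, split)].append(expression)
    return {key: mean(vals) for key, vals in values.items()}


# Group by response, then split each response group by cluster.
means = subgroup_means(cells)
for (response, cluster), value in sorted(means.items()):
    print(f"{response} / {cluster}: {value:.2f}")
```

Each key in the result corresponds to one subgroup on the plot: a response category subdivided by cluster.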

Downloading Plots

To download a plot, navigate to the tab of options for that plot and click the download button at the bottom of the tab. The plot will be saved to your computer as an image in .png format.

Changing the Size of Plots

The height and width of each plot may be adjusted by clicking “manually adjust plot dimensions” in the options tab specific to the plot. Use the sliders or the text boxes to adjust the width and height. To submit a value in the text box, press enter after entering the value in the box. For downloaded plot images, the ratio between the selected width and height will reflect the values entered, though the image size of the download will be larger to improve the quality of the downloaded image.
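The relationship between the on-screen dimensions and the downloaded image can be illustrated with a small calculation. The sketch below is an assumption-laden example in Python, not the app's implementation: the download_size function and the scale factor of 3 are hypothetical, chosen only to show that multiplying both dimensions by the same factor enlarges the image while keeping the width-to-height ratio unchanged.

```python
def download_size(width_px, height_px, scale=3):
    """Scale both dimensions by the same factor.

    `scale` is a hypothetical value used for illustration; the factor the
    app actually applies to downloads is not documented here.
    """
    return width_px * scale, height_px * scale


# An 800 x 600 plot on screen keeps its 4:3 ratio in the larger download.
width, height = download_size(800, 600)
print(width, height)
```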

Differential Expression Tables

Differential expression tables list the top genes differentially expressed between groups of cells based on their metadata. By default, the tables tab shows the top genes differentially expressed in resistant vs. sensitive patients (the table is split by the “response” metadata variable). The tables may also be displayed by patient ID to show the top genes in each patient relative to the others: to view the data by patient, choose “patient ID” in the “select Variable to view gene expression data by” menu, and select the patient to compare in the dropdown menu beneath it. You can view data for differentially expressed surface protein markers by selecting “ADT” from the “choose assay to view” dropdown menu.

Gene Correlations

The gene correlations tab displays a list of genes correlated with a single gene of interest. When a gene is entered, the Pearson correlation coefficient is computed between that gene and all other genes, and the top correlated genes are shown in a table. The analysis can be performed for all cells in the dataset, or a subset can be chosen based on any desired combination of clusters, patient IDs, and response categories. When the app is updated with the D0/D30 data, the ability to subset based on day of sample will be added.
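To make the computation concrete, here is a minimal sketch of ranking genes by their Pearson correlation with a gene of interest. It is written in plain Python for illustration and is not the app's own code (the app is built in R); the expression values, the placeholder gene names GENE_A through GENE_C, and the pearson and top_correlated helpers are all hypothetical. Genes with constant expression are skipped because the coefficient is undefined for them.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Returns None when either sequence is constant, since the coefficient
    is undefined in that case.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)


def top_correlated(expression, target, n_top=10):
    """Rank all other genes by correlation with `target`, highest first."""
    target_values = expression[target]
    results = []
    for gene, values in expression.items():
        if gene == target:
            continue
        r = pearson(target_values, values)
        if r is not None:
            results.append((gene, r))
    results.sort(key=lambda item: item[1], reverse=True)
    return results[:n_top]


# Hypothetical expression values: one list per gene, one entry per cell.
expression = {
    "BCL2": [0.0, 1.0, 2.0, 3.0],
    "GENE_A": [0.0, 2.0, 4.0, 6.0],  # rises with BCL2
    "GENE_B": [3.0, 2.0, 1.0, 0.0],  # falls as BCL2 rises
    "GENE_C": [1.0, 1.0, 1.0, 1.0],  # constant, so it is skipped
}

table = top_correlated(expression, "BCL2")
for gene, r in table:
    print(f"{gene}: {r:.2f}")
```

Restricting the analysis to a subset amounts to keeping only the entries for the selected cells in each list before calling top_correlated.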

Enter Gene and Select a Subset

Use the bar on the left-hand side of the screen to enter the gene of interest and specify a subset. Gene entry works the same way as in the plots tab, except that only one gene can be entered at a time. Use the dropdown menus beneath the gene entry box to specify the subset. The dropdown menus can be used to select any combination of clusters, patient IDs, and response categories by clicking each category. You can use the “Select All” and “Deselect All” buttons at the top of each menu to speed up your selection. All of the data is selected by default. When you have made your desired selections, click “submit” to generate the correlation table. It should take between 15 and 45 seconds to generate the table.

Correlations Report

Once the computation is finished, a report will be generated on the main screen. Your subset selections will appear at the top of the report, and statistics for the chosen gene and subset will be displayed beneath the selections. The number of cells in your selection with nonzero reads for the selected gene is an important metric; if this is too low, the computed correlations will be unreliable. We recommend that at least 10% of the cells in the subset have nonzero reads for the selected gene. The subset should also ideally contain at least 50 cells.
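The two guidelines above can be expressed as a simple check. The following Python sketch is illustrative only and is not taken from the app: the subset_statistics function and its field names are hypothetical, and the thresholds are the recommended values stated above (at least 50 cells and at least 10% of cells with nonzero reads).

```python
def subset_statistics(values, min_cells=50, min_nonzero_pct=10.0):
    """Summarize a subset's reads for one gene and check both guidelines.

    `values` holds one read count per cell in the subset. The thresholds
    default to the recommendations given in this guide.
    """
    n_cells = len(values)
    n_nonzero = sum(1 for v in values if v > 0)
    nonzero_pct = 100.0 * n_nonzero / n_cells if n_cells else 0.0
    return {
        "n_cells": n_cells,
        "n_nonzero": n_nonzero,
        "nonzero_pct": nonzero_pct,
        "meets_guidelines": n_cells >= min_cells and nonzero_pct >= min_nonzero_pct,
    }


# Hypothetical subsets: 60 cells with 9 nonzero, and 40 cells with 20 nonzero.
large_subset = subset_statistics([1] * 9 + [0] * 51)
small_subset = subset_statistics([1] * 20 + [0] * 20)
print(large_subset)
print(small_subset)
```

The second subset has a high nonzero percentage but fails the check because it contains fewer than 50 cells.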

The correlations table is displayed beneath the statistics. Pearson correlation coefficients are shown for each gene: by default, the top positively correlated genes are shown in descending order. To see the top negatively correlated genes, click the arrow to the right of the “Correlation Coefficient” column header. You can use the search bar in the upper right-hand corner of the table to search for a specific gene, and the text entry box beneath the “Correlation Coefficient” column header to filter the table for correlation coefficients within a desired range. To filter, either use the slider that appears when the text box is clicked, or enter lower and upper bounds using the format “(lower bound) … (upper bound)”, for example “.50 … .95”. Press enter/return to update the table with the chosen criteria. The number of entries per page can be set at the upper right, and you can navigate between pages using the buttons at the bottom of the table.
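As an illustration of the range format, the sketch below parses a filter string such as “.50 … .95” and keeps the rows whose coefficient falls within the bounds. It is a Python example under stated assumptions, not the table's actual filtering code: the parse_range and filter_rows helpers and the sample rows are hypothetical, the bounds are assumed to be inclusive, and three plain periods are accepted in place of the “…” character as a convenience.

```python
import re

# A number, a separator ("…" or "..."), then a second number.
_RANGE = re.compile(r"^\s*(-?\d*\.?\d+)\s*(?:…|\.\.\.)\s*(-?\d*\.?\d+)\s*$")


def parse_range(text):
    """Parse "(lower bound) … (upper bound)" into a (lower, upper) tuple."""
    match = _RANGE.match(text)
    if match is None:
        raise ValueError(f"not a valid range: {text!r}")
    lower, upper = float(match.group(1)), float(match.group(2))
    if lower > upper:
        raise ValueError("lower bound exceeds upper bound")
    return lower, upper


def filter_rows(rows, text):
    """Keep (gene, coefficient) rows within the range (inclusive, by assumption)."""
    lower, upper = parse_range(text)
    return [(gene, r) for gene, r in rows if lower <= r <= upper]


# Hypothetical table rows.
rows = [("GENE_A", 0.97), ("GENE_B", 0.72), ("GENE_C", 0.31)]
kept = filter_rows(rows, ".50 … .95")
print(kept)
```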

Visualizing Correlation With Scatterplot

Click on any gene in the table to view a scatterplot showing the expression of that gene vs. the expression of the gene entered on the left, to visualize the correlation between the two genes. Each point represents a cell; all cells from the chosen subset are plotted. The points are colored by the cluster the cell belongs to.